Enumerate "Data" Big Idea from College Board

Talk with a partner about some of the big ideas and vocab that you observe ...

  • "Data compression is the reduction of the number of bits needed to represent data"
  • "Data compression is used to save transmission time and storage space."
  • "Lossy compression can reduce data size, but the original data cannot be recovered"
  • "Lossless compression lets you restore and recover the original data"
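A minimal sketch of the difference, using Python's standard zlib module for lossless compression. The quantize function is a hypothetical stand-in for lossy compression that simply throws away the low 4 bits of each byte; real lossy formats such as JPEG are far more sophisticated.

```python
import zlib

# Some repetitive sample data (hypothetical), e.g. a row of similar pixel values
original = bytes([200, 201, 202, 203] * 64)

# Lossless: zlib shrinks the data, and decompressing restores every byte
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(original), len(packed), restored == original)

# Lossy (illustrative only): keep only the high 4 bits of each byte
def quantize(data):
    return bytes(b & 0xF0 for b in data)

lossy = quantize(original)
print(lossy == original)  # False: the low bits are gone for good
print(set(lossy))         # {192}: 200, 201, 202 and 203 all collapse to 192
```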

The Image Lab Project contains a plethora of College Board Unit 2 data concepts. Working with images provides many opportunities for compression and for analyzing size.

Image Files and Size

Here are some image files. Download these files and load them into the images directory under _notebooks in your blog.

Describe some of the metadata and considerations when managing image files. Describe how these relate to data compression ...

  • File Type, PNG and JPG are two types used in this lab
  • Size, height and width, number of pixels
  • Visual perception, lossy compression
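As a rough sketch, the uncompressed size of an image is width x height x bytes per pixel, where the bytes per pixel depend on the PIL mode (1 for "L", 3 for "RGB", 4 for "RGBA" at 8 bits per channel). The actual PNG or JPG file is usually smaller because the format compresses the pixel data, so comparing the two numbers shows how much the format saves. The raw_size helper is hypothetical; the sizes come from the lab output later in this post.

```python
# Bytes per pixel for common PIL modes (8 bits per channel)
BYTES_PER_PIXEL = {"L": 1, "RGB": 3, "RGBA": 4}

def raw_size(width, height, mode):
    """Uncompressed size in bytes: width x height x bytes per pixel."""
    return width * height * BYTES_PER_PIXEL[mode]

print(raw_size(16, 16, "RGBA"))    # 1024    (Green Square, original)
print(raw_size(1048, 796, "RGB"))  # 2502624 (mountain, original)
print(raw_size(320, 243, "RGB"))   # 233280  (mountain, scaled)
```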

Displaying images in Python Jupyter notebook

Python Libraries and Concepts used for Jupyter and Files/Directories

IPython

Supports visualization of data in Jupyter notebooks. Visualization is specific to the notebook view; for the web, visualizations need to be converted to HTML.

pathlib

File paths are different on Windows versus Mac and Linux: Windows separates folders with a backslash (\), while Mac and Linux use a forward slash (/). This can cause problems in a project as you work and deploy on different operating systems (OSs); pathlib is a solution to this problem.
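A quick sketch of how pathlib hides the separator difference. The pure path classes let you see both flavors on any OS, while Path picks the flavor for the machine the notebook runs on. The file name is the lab's own green-square-16.png; nothing is read from disk.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The same relative path built with the / operator
posix = PurePosixPath("images") / "green-square-16.png"
windows = PureWindowsPath("images") / "green-square-16.png"
print(posix)    # images/green-square-16.png
print(windows)  # images\green-square-16.png

# Path uses the separator of the OS the notebook is running on
local = Path("images") / "green-square-16.png"
print(local.name, local.suffix, local.parent)  # green-square-16.png .png images
```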

  • What commands do you use in the terminal to access files?

ls, cd, pwd

  • What commands do you use in the Windows terminal to access files?

dir, ren, chdir

  • What are some of the major differences?

Windows vs. Linux: directory listing (Windows: dir, Linux: ls); renaming a file (Windows: ren, Linux: mv); printing the current directory location (Windows: chdir, or cd with no arguments, Linux: pwd). Both use cd to change the current directory.

Provide what you observed, struggled with, or learned while playing with this code.

  • Why is path a big deal when working with images?

A path identifies the location of the image file on a computer or server so that you can open, edit, and display it. A path includes the name of the directory/folder and any subdirectories or subfolders that the file is located in. This information is important because it is how the program locates and retrieves the image file when needed; if the path is wrong, or uses the wrong separator for the operating system, the image cannot be loaded.

  • How do the metadata source and label relate to Unit 5 topics?

Unit 5 is where students learn about the development of computer programs that process and analyze data. This includes working with text, images, and video, and understanding how to access and manipulate that data. For images, metadata is the information embedded in an image file, such as the date/time the image was taken, the camera settings used to capture it, and the location where it was taken. This data can be accessed and analyzed using programming techniques. In this lab, source and label are metadata we attach ourselves: they are stored in each image's dictionary next to its file name. Labeling means assigning descriptive tags/labels to images to help identify and classify them. It is typically used in machine learning (ML) and artificial intelligence (AI), where image recognition can classify and identify different objects; for example, a robot can learn to tell the difference between a cat and a dog.

  • Look up IPython. Why is it interesting in Jupyter Notebooks for both pandas and images?

IPython (Interactive Python) is a command shell for interactive computing. Its notebook interface was spun off into Project Jupyter, and IPython now serves as the Python kernel behind Jupyter Notebook. Jupyter Notebook is a good platform for pandas and images because it puts data, code, visualizations, and documentation together in one notebook, and IPython's display system renders pandas DataFrames as tables and images inline. You can collect metadata from many images, load it into a pandas DataFrame, and then analyze or modify it.

from IPython.display import Image, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
            {'source': "Emma Shen", 'label': "Smiley Face", 'file': "smileyface.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:  
        display(Image(filename=image['filename']))


# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # display a single image supplied as a parameter
    green_square = image_data(images=[{'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"}])
    image_display(green_square)
    
    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)
    

Reading and Encoding Images (2 implementations follow)

PIL (Python Imaging Library)

Pillow, the actively maintained fork of PIL, provides the ability to work with images in Python. GeeksforGeeks shows some ideas on working with images.

base64

Image formats (JPG, PNG) are binary file formats, which makes them difficult to pass through text-based channels such as HTML, CSS, JSON, or email. Base64 converts binary data (8-bit bytes) into a text encoding: every 24 bits (3 bytes) of input become four 6-bit Base64 digits, each written as a printable ASCII character. Thus base64 is used to transport and embed binary images in textual assets such as HTML and CSS.
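A minimal sketch of how an image ends up inside HTML as a data URI, using the standard base64 module. to_data_uri is a hypothetical helper name, and only the 8-byte PNG signature is encoded rather than a whole image; it shows why base64-encoded PNGs in HTML usually begin with iVBORw0KGgo.

```python
import base64

def to_data_uri(data: bytes, mime: str = "image/png") -> str:
    """Embed raw bytes in a text string usable as an <img src=...> value."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")

# The first 8 bytes of every PNG file (the PNG signature)
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature)
print(uri)  # data:image/png;base64,iVBORw0KGgo=

# Decoding the text gives back exactly the same bytes
encoded = uri.split(",", 1)[1]
print(base64.b64decode(encoded) == png_signature)  # True
```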

  • How is Base64 similar to or different from binary and hexadecimal?

Binary is the most basic representation of data, using only 0 and 1 for each bit of information, while hexadecimal is a base-16 numbering system that uses 16 digits (0-9 and A-F); each hex digit stands for 4 bits, so two hex digits represent one byte. Both are numbering systems for representing data in computers. Base64, however, is a method for encoding binary data as text using 64 characters that are safe to use in email and other text-based communication channels. Its alphabet is the largest of the three: uppercase letters A-Z, lowercase letters a-z, digits 0-9, and the symbols + and / (with = used for padding). It works by breaking the input data into blocks of three bytes and encoding each block as four characters from the 64-character set. The resulting encoded text can then be sent as ASCII text.
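To compare the three side by side, here is a small sketch that writes the same three bytes (the ASCII text "emm") in binary, hexadecimal, and Base64 using only built-in formatting and the standard base64 module. The same 24 bits take 24 binary digits, 6 hex digits, or 4 Base64 characters.

```python
import base64

data = "emm".encode("ascii")

# Binary: 8 bits per byte
binary = " ".join(f"{b:08b}" for b in data)
# Hexadecimal: 2 hex digits (4 bits each) per byte
hexa = data.hex()
# Base64: every 3 bytes (24 bits) become 4 characters (6 bits each)
b64 = base64.b64encode(data).decode("ascii")

print(binary)  # 01100101 01101101 01101101
print(hexa)    # 656d6d
print(b64)     # ZW1t
```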

  • Translate the first 3 letters of your name to Base64:

emm --> ZW1t
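The translation can be checked with a hand-written encoder. This is a teaching sketch of the standard Base64 algorithm (the RFC 4648 alphabet, with = padding); b64_by_hand is a hypothetical name, and in practice you would call base64.b64encode.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_by_hand(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into one 24-bit number (missing bytes count as 0)
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indexes into the alphabet
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A 1-byte chunk keeps 2 characters, a 2-byte chunk keeps 3; '=' pads the rest
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

print(b64_by_hand(b"emm"))  # ZW1t
print(b64_by_hand(b"em"))   # ZW0=
print(b64_by_hand(b"emm") == base64.b64encode(b"emm").decode())  # True
```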

numpy

Numpy is described as "The fundamental package for scientific computing with Python". In the Image Lab, a NumPy array is created from the image data to simplify accessing and changing the RGB values of the pixels, for example when converting pixels to grey scale.
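A sketch of the lab's averaging grey scale done on a whole NumPy array at once instead of pixel by pixel. The 1x3 array is hypothetical sample data. Note the cast to a wider integer type: adding uint8 values directly wraps around at 256.

```python
import numpy as np

# A tiny hypothetical 1x3 RGB image: pure red, pure green, and white
pixels = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)

# Widen before adding so R + G + B cannot overflow uint8
rgb = pixels[..., :3].astype(np.uint16)
average = (rgb.sum(axis=-1) // 3).astype(np.uint8)

# Put the average back into all three channels
grey = np.stack([average, average, average], axis=-1)
print(grey.tolist())  # [[[85, 85, 85], [85, 85, 85], [255, 255, 255]]]
```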

io, BytesIO

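Input and output (I/O) are fundamental to all computer programming. I/O buffering is a technique used to optimize I/O operations: data is collected in a temporary area of memory, the buffer, so it can be read or written in larger chunks. With large quantities of data, the buffer holds the frames of input that the server has queued but not yet processed. In this example, there is a very large picture that lags while it loads. BytesIO, from Python's io module, is an in-memory buffer that behaves like a binary file.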
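In the lab, PIL saves the encoded image into a BytesIO buffer and getvalue() hands the bytes to base64 without touching the disk. A minimal sketch of that buffer behavior, with hand-written bytes in place of an image:

```python
from io import BytesIO

# BytesIO behaves like a binary file that lives in memory
with BytesIO() as buffer:
    buffer.write(b"\x89PNG")      # write bytes, as img.save(buffer, format) would
    buffer.write(b"\r\n\x1a\n")
    data = buffer.getvalue()      # everything written so far, as one bytes object
    buffer.seek(0)                # rewind to read from the start
    first = buffer.read(4)

print(len(data), first)  # 8 b'\x89PNG'
```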

  • Where have you been a consumer of buffering?

I see buffering/loading screens everywhere on my phone and computer. Buffering screens cover the time needed to process the input and prepare the output for display; they keep the user engaged and show that the request is still processing. Below are two ways that Synergy StudentVue uses buffering.

  • From your consumer experience, what effects have you experienced from buffering?

The more data being loaded, the longer the buffering lasts.

  • How do these effects apply to images?

Larger images take more time to load. Lossy files usually load faster because compression discards some data to make the file smaller, while lossless files keep all the original data and preserve quality, so they are usually larger and take longer to load.
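A back-of-the-envelope sketch: ignoring latency and protocol overhead, transfer time grows linearly with size. The 10 Mbit/s link is a hypothetical value, and the byte counts are uncompressed sizes (width x height x 3) of the mountain image from the lab output, not actual file sizes.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: 8 bits per byte divided by the link speed."""
    return size_bytes * 8 / bits_per_second

LINK = 10_000_000  # hypothetical 10 Mbit/s connection

# Uncompressed mountain image before (1048 x 796) and after (320 x 243) scaling
print(round(transfer_seconds(2_502_624, LINK), 3))  # 2.002
print(round(transfer_seconds(233_280, LINK), 3))    # 0.187
```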

Data Structures, Imperative Programming Style, and working with Images

Introduction to creating metadata and manipulating images. Look at each procedure and explain the purpose and results of this program. Add any insights or challenges you found as you explored this program.

Manipulate images to different colors / filters

  • Does this code seem like a series of steps are being performed?

Yes. The code runs as a sequence of steps that change the program's data, which is the imperative programming style.

  • Describe the Grey Scale algorithm in English or pseudocode.

1. Initialize a new image object with the same dimensions as the input image.
2. For each pixel in the input image:
   a. Retrieve the red, green, and blue values of the pixel.
   b. Calculate the average of the red, green, and blue values.
   c. Set the red, green, and blue values of the pixel in the new image object to the average value (keeping the alpha value, if there is one).
3. Return the new image object.

  • Describe scale image. What is before and after on pixels in the three images?

Scaling an image means changing its size, either making it larger or smaller. When an image is scaled, its pixel grid is resampled: each pixel in the new image gets a position in the new grid and a color value computed from one or more nearby pixels of the original, creating a new image with the desired size.

Before scaling, each pixel in the original image has a specific position (x, y) and color value (RGB).

After scaling, the positions and color values of the pixels in the scaled image differ from those in the original image. The lab scales every image to a width of 320 pixels, which gives these pixel counts in the output below:

  • Green Square: 16 x 16 = 256 pixels before, 320 x 320 = 102,400 pixels after (enlarged)
  • Clouds Impression: 320 x 234 = 74,880 pixels before and after (unchanged)
  • pretty (mountain): 1048 x 796 = 834,208 pixels before, 320 x 243 = 77,760 pixels after (reduced)

  • Is scale image a type of compression? If so, line it up with the College Board terms described.

Scaling an image can act like a type of compression when it makes the image smaller, because fewer pixels need fewer bits to represent the image.

For College Board AP CSP, compression is described as "reducing the number of bits used to represent data in a file." This usually involves techniques such as lossless compression algorithms, which exploit redundancy in the data to reduce its size without losing any information, or lossy compression algorithms, which selectively discard some information in order to achieve greater compression.

Scaling, on the other hand, changes the size and resolution of the image rather than encoding the same pixels more efficiently. Scaling down (as with the mountain image) reduces the number of bits and discards detail that cannot be recovered, so it lines up with lossy compression; scaling up (as with the green square) increases the number of bits, so it is not compression at all.
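A sketch of what scaling does to the pixel grid, using nearest-neighbor sampling on a NumPy array: every output pixel copies the closest source pixel, and the height follows the same aspect-ratio formula as the lab's scale_image(). This is a simplification; recent versions of PIL's resize() default to a smoother filter that blends neighboring pixels. scale_nearest is a hypothetical helper.

```python
import numpy as np

def scale_nearest(pixels, new_width):
    """Nearest-neighbor resize to new_width, keeping the aspect ratio."""
    height, width = pixels.shape[:2]
    new_height = int(height * new_width / width)
    # For every output row/column, pick the closest source row/column
    rows = np.arange(new_height) * height // new_height
    cols = np.arange(new_width) * width // new_width
    return pixels[rows[:, None], cols]

# Hypothetical 4x4 single-channel image with values 0..15
small = np.arange(16, dtype=np.uint8).reshape(4, 4)
half = scale_nearest(small, 2)
print(half.tolist())          # [[0, 2], [8, 10]]
print(small.size, half.size)  # 16 4: three quarters of the pixels are discarded
```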
from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage  # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Emma Shen", 'label': "pretty", 'file': "mountain.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

# Scale image to a width of baseWidth (320), keeping the aspect ratio; small images get enlarged
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

# Set Properties of Image, Scale, and convert to Base64
def image_management(image):
    # Open the image file; returns a PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML; the MIME type follows the file format (image/png, image/jpeg, ...)
    image['html'] = '<img src="data:image/%s;base64,%s">' % (
        image['format'].lower(), image_to_base64(image['pil'], image['format']))

# Create Grey Scale Base64 representation of Image
def image_management_add_html_grey(image):
    # Scaled PIL image object created by image_management()
    img = image['pil']
    format = image['format']

    img_data = img.getdata()  # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data)  # PIL image to numpy array
    image['gray_data'] = []  # list of pixels converted to gray scale

    # 'data' is an array of RGB or RGBA pixels; each is replaced by the average of its R, G, B values
    for pixel in image['data']:
        # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
        average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
        if len(pixel) > 3:
            image['gray_data'].append((average, average, average, pixel[3]))  # keep alpha channel (RGBA)
        else:
            image['gray_data'].append((average, average, average))
        # end for loop for pixels

    img.putdata(image['gray_data'])
    image['html_grey'] = '<img src="data:image/%s;base64,%s">' % (
        format.lower(), image_to_base64(img, format))


# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Build the list of image dictionaries (file names and metadata)
    images = image_data()

    # Display meta data, scaled view, and grey scale for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])

        print("-- original image --")
        display(HTML(image['html']))

        print("--- grey image ----")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey']))
    print()
      
---- meta data -----
Green Square
Internet
PNG
RGBA
Original size:  (16, 16)
Scaled size:  (320, 320)
-- original image --

--- grey image ----

---- meta data -----
Clouds Impression
Peter Carolin
PNG
RGBA
Original size:  (320, 234)
Scaled size:  (320, 234)
-- original image --

--- grey image ----

---- meta data -----
pretty
Emma Shen
PNG
RGB
Original size:  (1048, 796)
Scaled size:  (320, 243)
-- original image --

--- grey image ----

Data Structures and OOP

Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a college course, OOP will be talked about often. The functionality in the remainder of this blog is the same as in the prior implementation. Highlight some of the key differences you see between the imperative and OOP styles.

  • Read about imperative and object-oriented programming on Wikipedia
  • Consider how data is organized in the two examples, in relation to procedures
  • Look at parameters in the imperative style and self in OOP

Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...

  • PIL
  • numpy
  • base64
from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage  # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np


class Image_Data:

    def __init__(self, source, label, file, path, baseWidth=320):
        self._source = source    # variables with self prefix become part of the object
        self._label = label
        self._file = file
        self._filename = path / file  # file with path
        self._baseWidth = baseWidth

        # Open image and scale to needs
        self._img = pilImage.open(self._filename)
        self._format = self._img.format
        self._mode = self._img.mode
        self._originalSize = self._img.size
        self.scale_image()
        self._html = self.image_to_html(self._img)
        self._html_grey = self.image_to_html_grey()

    @property
    def source(self):
        return self._source

    @property
    def label(self):
        return self._label

    @property
    def file(self):
        return self._file

    @property
    def filename(self):
        return self._filename

    @property
    def img(self):
        return self._img

    @property
    def format(self):
        return self._format

    @property
    def mode(self):
        return self._mode

    @property
    def originalSize(self):
        return self._originalSize

    @property
    def size(self):
        return self._img.size

    @property
    def html(self):
        return self._html

    @property
    def html_grey(self):
        return self._html_grey

    # Scale image to a width of baseWidth (default 320), keeping the aspect ratio
    def scale_image(self):
        scalePercent = (self._baseWidth/float(self._img.size[0]))
        scaleHeight = int((float(self._img.size[1])*float(scalePercent)))
        scale = (self._baseWidth, scaleHeight)
        self._img = self._img.resize(scale)

    # PIL image converted to base64 and wrapped in an HTML <img> tag
    def image_to_html(self, img):
        with BytesIO() as buffer:
            img.save(buffer, self._format)
            b64 = base64.b64encode(buffer.getvalue()).decode()
        # the MIME type follows the file format (image/png, image/jpeg, ...)
        return '<img src="data:image/%s;base64,%s">' % (self._format.lower(), b64)

    # Create Grey Scale Base64 representation of Image
    def image_to_html_grey(self):
        img_grey = self._img.copy()  # work on a copy so the scaled color image is not overwritten
        pixels = np.array(self._img.getdata())  # PIL image to numpy array

        grey_data = []  # list of pixels converted to gray scale
        # each RGB or RGBA pixel is replaced by the average of its R, G, B values
        for pixel in pixels:
            # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
            average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
            if len(pixel) > 3:
                grey_data.append((average, average, average, pixel[3]))  # keep alpha channel (RGBA)
            else:
                grey_data.append((average, average, average))
            # end for loop for pixels

        img_grey.putdata(grey_data)
        return self.image_to_html(img_grey)


# prepares a series of images, provides expectation for required contents
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
            {'source': "Emma Shen", 'label': "pretty", 'file': "mountain.png"}
        ]
    return path, images

# turns data into objects
def image_objects():
    id_Objects = []
    path, images = image_data()
    for image in images:
        id_Objects.append(Image_Data(source=image['source'],
                                     label=image['label'],
                                     file=image['file'],
                                     path=path,
                                     ))
    return id_Objects

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    for ido in image_objects():  # ido is an Image_Data object

        print("---- meta data -----")
        print(ido.label)
        print(ido.source)
        print(ido.file)
        print(ido.format)
        print(ido.mode)
        print("Original size: ", ido.originalSize)
        print("Scaled size: ", ido.size)

        print("-- scaled image --")
        display(HTML(ido.html))

        print("--- grey image ---")
        display(HTML(ido.html_grey))

    print()
      
---- meta data -----
Green Square
Internet
green-square-16.png
PNG
RGBA
Original size:  (16, 16)
Scaled size:  (320, 320)
-- scaled image --

--- grey image ---

---- meta data -----
Clouds Impression
Peter Carolin
clouds-impression.png
PNG
RGBA
Original size:  (320, 234)
Scaled size:  (320, 234)
-- scaled image --

--- grey image ---

---- meta data -----
Lassen Volcano
Peter Carolin
lassen-volcano.jpg
JPEG
RGB
Original size:  (2792, 2094)
Scaled size:  (320, 240)
-- scaled image --

--- grey image ---

---- meta data -----
pretty
Emma Shen
mountain.png
PNG
RGB
Original size:  (1048, 796)
Scaled size:  (320, 243)
-- scaled image --

--- grey image ---

Hacks!!!

1. College Board Practice

Data Compression Quiz 3/3

Extracting Information from Data Quiz 5/6

Question 5 mistake

Explanation: The data is collected only through image recognition by a camera. Thus, the number of bicycles that passed on a particular day can be recorded (Answer Choice D). However, calculating an AVERAGE speed would require additional data, such as each car's distance traveled and time.

Using Programs with Data Quiz 5/6

Question 4 mistake

Explanation: The key to this type of question is that I need to select TWO correct answers. I chose Answer C, which is wrong because that sequence of steps does not remove entries with an unknown year, so the first row of the spreadsheet will have a year value of -1. I should have chosen Answer D instead: sorting by year sorts the spreadsheet on column C from least to greatest, filtering by year removes any entries with unknown years, and filtering by photographer removes any entries with unknown photographers. Since the filters do not change the order of the remaining entries, the photograph with the lowest year value will be in the first row of the spreadsheet.
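The sort-then-filter reasoning can be checked with a small sketch. The rows below are hypothetical stand-ins for the quiz spreadsheet, with -1 marking an unknown year and "unknown" marking an unknown photographer. Filtering does not reorder the rows that remain, so sorting before or after filtering puts the same photo first, while sorting alone leaves the -1 row at the top.

```python
# Hypothetical rows standing in for the quiz spreadsheet; -1 marks an unknown year
photos = [
    {"photographer": "A", "year": 1990},
    {"photographer": "unknown", "year": 1975},
    {"photographer": "B", "year": -1},
    {"photographer": "C", "year": 1982},
]

def filtered(rows):
    """Drop rows with an unknown year or an unknown photographer."""
    return [r for r in rows if r["year"] != -1 and r["photographer"] != "unknown"]

def by_year(rows):
    return sorted(rows, key=lambda r: r["year"])

print(by_year(photos)[0]["year"])  # -1: sorting alone keeps the unknown year on top
a = filtered(by_year(photos))      # sort first, then filter
b = by_year(filtered(photos))      # filter first, then sort
print(a[0], a == b)                # {'photographer': 'C', 'year': 1982} True
```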

Binary Quiz

2. Lossy vs Lossless Image

  1. Lossless: compression that reduces the file size without any loss of quality. Example: a logo that is reproduced in multiple places and sizes on a website.

  2. Lossy: compression that removes some of the data from the image file, reducing the overall file size. This process is irreversible, meaning the removed information is lost permanently.

  Example: when you email an image to a friend, the mail app may ask whether to send it as small, medium, large, or actual size. Choosing a smaller size reduces the size and quality of the picture so the file sends faster.

3. Programming paradigm

  • Numpy, manipulating pixels. As opposed to grey scale treatment, pick a couple of other types like red scale, green scale, or blue scale. We want you to be manipulating pixels in the image.

  • Binary and hexadecimal reports. Convert pixels to binary and hexadecimal and display them.

  • Compression and sizing of images. Look for insights into lossy and lossless compression. Look at the PIL library and see if there are other things that can be done.

  • There are many other effects you can do with PIL. Blur the image or write metadata on screen, aka Title, Author, and Image size.

import numpy as np
from PIL import Image

# Load the image
image = Image.open('images/bat.png')

# Add a title to the image's metadata dictionary (in memory only; nothing is drawn on the image)
image.info['Title'] = 'Redscaled'

# Make sure every pixel has color channels (e.g. palette "P" images are converted to RGBA)
rgb = image if image.mode in ('RGB', 'RGBA') else image.convert('RGBA')

# Convert the image to a NumPy array of shape (height, width, channels)
img_array = np.asarray(rgb)

# Convert the array to binary representation: each 8-bit channel becomes 8 separate bits
binary_pixels = np.unpackbits(img_array, axis=-1)

# Convert each channel value to a two-digit hexadecimal string
hex_pixels = np.vectorize(lambda v: format(int(v), '02x'))(img_array)

# Display the binary and hexadecimal pixels
#print("Binary pixels:\n", binary_pixels)
#print("Hexadecimal pixels:\n", hex_pixels)

# Create a writable copy of the array (np.asarray may return a read-only view)
red_img = np.copy(img_array)

# Set the green and blue channels to 0, leaving only the red channel (and alpha, if any)
red_img[:, :, 1] = 0
red_img[:, :, 2] = 0

# Convert the NumPy array back to an image
red_image = Image.fromarray(red_img)

# Save the red-scale image under a new name so the original bat.png is not overwritten
red_image.save('images/bat-red.png')

# Resize the image to half its original size
resized_image = red_image.resize((red_image.width // 2, red_image.height // 2))

print(image.info)

# Show the resized image
resized_image.show()

{'srgb': 0, 'gamma': 0.45455, 'dpi': (143.99259999999998, 143.99259999999998), 'Title': 'Redscaled'}
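Building on the red-scale hack, a sketch that generalizes it to any channel and builds a per-pixel binary and hexadecimal report. The helper names channel_scale and pixel_report and the 1x2 RGBA array are hypothetical; with a real file you would pass something like np.array(Image.open(path).convert("RGBA")).

```python
import numpy as np

def channel_scale(pixels, keep):
    """Keep one color channel (0=red, 1=green, 2=blue) and zero the other two."""
    out = pixels.copy()
    for channel in range(3):
        if channel != keep:
            out[..., channel] = 0
    return out  # an alpha channel, if present, is left unchanged

def pixel_report(pixels):
    """Per-pixel list of 8-bit binary strings plus one hex string."""
    flat = pixels.reshape(-1, pixels.shape[-1]).tolist()  # plain Python ints
    return [([f"{v:08b}" for v in p], "".join(f"{v:02x}" for v in p)) for p in flat]

# Hypothetical 1x2 RGBA image
img = np.array([[[200, 100, 50, 255], [10, 20, 30, 128]]], dtype=np.uint8)
blue = channel_scale(img, keep=2)
print(blue.tolist())  # [[[0, 0, 50, 255], [0, 0, 30, 128]]]
for bits, hexa in pixel_report(img):
    print(bits, hexa)
```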